All right. I guess we'll get started. Hi, everyone. My name is Soen, and I'm talking
about evolving exploits through genetic algorithms. Before I jump into genetic algorithms,
though, I want to give you a little background on who I am. I've been with DEF CON for
many years. I do programming. I love viruses and worms, and I've been trained as a computer
scientist. And I do penetration testing in the daylight hours. But, you know, I'm still
a noob. This talk grew mainly out of my computer science interests, my job, and my inner
laziness wanting to come out. I was looking at my job, and what I do on a day-to-day basis
is exploit web applications. And there's a number of problems associated with, you know,
performing this task.
The major ones are: it's driven by the customer, so you have to provide them what
they want. There's a small scope; you're only allowed to hit a tiny portion of the site,
so you have to have scalpel-like efficiency. You can't hit the whole web server with a
hammer. You only have a limited amount of time, usually very short, as in a day, two
days, three days. And it's all report-driven, because the deliverable is a report
to the customer.
These problems are what has been driving me to look into this area, and there's a number
of ways I approached trying to solve them. My methodology was usually to
run as many scanning tools as possible against a web application and then manually poke at
the areas that come up as suspicious. From there, if it does turn out to be
exploitable, I write an exploit for it. But there's a couple of problems inherent in that
approach.
The code coverage is inherently small, because I'm trying to limit the amount of code that
I view on a day-to-day basis. I want to view less code and make sure that
the code I'm viewing is actually potentially vulnerable.
Also, inspecting the suspicious areas that are discovered by, say, web scanners
or manual testing is time-consuming as well.
And then additionally, developing a working exploit for a site takes time as
well, because there might be additional blocking mechanisms in place, like a WAF, a web application
firewall, where you can see you have SQL injection, but all of a sudden you don't really
have SQL injection because there's an additional layer you have to break through.
There's a number of really good tools out there for exploit discovery and development.
I use Acunetix, Burp, Zap, and SQLMap very frequently, and they're all fantastic
tools. But running those and some of the other scanning tools, like Nessus and Nmap,
I've realized that they all share one very big similarity.
It's a fundamental problem with an existing industry, and it's a fundamental problem with
web application scanners as we know them today.
So
Stand over there.
What up, bitches?
It's funny. He thought you were clapping for him.
He's like, well, I, you know, said SQL.
What?
Okay.
All right.
You know why we're here?
Wow.
This is the first time.
There you go.
That's what I'm talking about.
At the very back in the gray.
No, in the hoodie, man.
Bring your Skittles up here.
What's it called?
What do we call this?
Oh, what is this called?
Shot the noob.
Thank you.
Oh, my God.
That was awesome.
The price is right.
You are here.
All right.
Thank you.
Thank you.
Thank you, sir.
Wait, what's your name?
Conor.
Conor.
Conor represents all of you who are first timers at DEF CON.
Cheers.
Nice job.
Have fun.
So, foundational problems
with current techniques. Sorry, that's all I knew. I think he was talking about scanning.
Oh, scanning. Scanning and software and stuff.
Oh, my God, look, he's got a countdown timer. Yep.
Oh, shit, you only have five minutes to go, dude.
Four minutes? Wow, that sucks. All good.
Well, thank you for the alcohol. I appreciate it.
You're welcome. You live to serve.
So back on track. The foundational problem with web application scanners is that the current main technologies are built around a signature-based system.
They have an understanding of what a potential exploit could look like.
They throw it at the web server, and if the response matches what the signature expects, they mark it as a finding.
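To make the limitation concrete, here is a minimal sketch of a signature-based check. The payload list, the error markers, and the `toy_server` filter are all hypothetical, chosen only for illustration; real scanners ship far larger curated sets. The point is structural: the scanner can only ever try what is on its list, so a filter that blocks a listed string leaves it with nothing else to try.

```python
# Hypothetical payload and marker lists, for illustration only;
# real scanners ship far larger, curated signature sets.
SIGNATURES = ["' OR '1'='1", "'; DROP TABLE users; --", "; cat /etc/passwd"]
ERROR_MARKERS = ["SQL syntax", "sqlite3.OperationalError", "root:x:0:0"]


def signature_scan(send, signatures=SIGNATURES, markers=ERROR_MARKERS):
    """Send each canned payload and report those whose response contains a
    known marker. `send` is any callable mapping a payload to response text."""
    findings = []
    for payload in signatures:
        response_text = send(payload)
        if any(marker in response_text for marker in markers):
            findings.append(payload)
    return findings


# Toy target: errors on any single quote, unless a naive "WAF" sees "OR".
def toy_server(payload):
    if "OR" in payload:
        return "Request blocked"
    if "'" in payload:
        return "You have an error in your SQL syntax"
    return "Login failed"
```

Against `toy_server`, only the `DROP TABLE` string is reported; the classic `' OR '1'='1` is silently lost to the filter, even though the underlying injection is still there.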
Ugh.
And so, okay.
So I thought, you know, hey, why not take genetic algorithms and apply them to web applications?
Why not take, you know, your average basic SQL injection and go from something that a web application firewall can easily protect against
and a programmer can easily defend against
to something that is harder to stop?
And so this whole process of evolution is something that was really fascinating to me.
And so for this talk, we're going to use genetic algorithms to make exploits for SQL injection and command injection, and our attack surface is HTTP and HTTPS.
So it's web-based parameters.
And we're not going to cover anything else.
This could be applied to a number of different things, JSON, AJAX, what have you.
But just for the scope of this talk, we're talking about SQLi and command injection.
So the tool I wrote for this talk is called forced evolution.
And it takes this concept of I'm going to use genetics to write exploits for me so I don't have to do it myself.
It's the easy way.
It's the inner lazy programmer.
So what is a genetic algorithm?
Well, a genetic algorithm is essentially you create a large number of things.
And in this case, they'll be exploit strings.
And you look for a certain solution that these things will provide.
And in this case, it will be an exploit.
And then you score all the strings' performance using some sort of fitness function.
We'll get into what that fitness function is in our case later.
But it's a way of determining, using numbers, that this is a better injection string than the previous one.
And so our algorithm here is we have this loop.
While we haven't found the solution, we score.
We kill off all the low-performing strings.
We breed the strong-performing strings, the ones that are more efficient or they bypass or they exploit better.
And then we also mutate the strings randomly.
And then once we have found a correct exploit, we display it and show it.
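The loop just described (score, kill off the low performers, breed the strong ones, mutate, repeat until a solution appears) can be sketched generically. This is a minimal illustration, not forced evolution's code: `evolve`, `keep_fraction`, and the toy problem of evolving the word "admin" are assumptions standing in for real exploit strings and a real, response-based fitness function.

```python
import random
import string


def evolve(population, score, is_solution, breed, mutate,
           keep_fraction=0.25, max_generations=1000, rng=random):
    """Generic GA loop: score, cull, breed, mutate until a solution appears.
    Returns (best_solution_or_None, generations_used)."""
    size = len(population)
    for generation in range(max_generations):
        ranked = sorted(population, key=score, reverse=True)
        if is_solution(ranked[0]):
            return ranked[0], generation
        # Kill off the low performers; the top strings survive unchanged.
        survivors = ranked[:max(2, int(size * keep_fraction))]
        # Breed random pairs of survivors and mutate the children
        # until the population is back to its original size.
        children = []
        while len(survivors) + len(children) < size:
            a, b = rng.sample(survivors, 2)
            children.extend(mutate(child, rng) for child in breed(a, b))
        population = survivors + children[:size - len(survivors)]
    return None, max_generations


# Toy problem standing in for exploit search: evolve the string "admin".
TARGET = "admin"
LETTERS = string.ascii_lowercase


def toy_score(s):
    # Count positions that already match the target.
    return sum(1 for x, y in zip(s, TARGET) if x == y)


def toy_breed(a, b):
    # Swap halves between the two parents.
    mid = len(a) // 2
    return [a[:mid] + b[mid:], b[:mid] + a[mid:]]


def toy_mutate(s, rng):
    # Half the time, replace one random character.
    if rng.random() < 0.5:
        i = rng.randrange(len(s))
        s = s[:i] + rng.choice(LETTERS) + s[i + 1:]
    return s
```

Usage: seed 100 random five-letter strings and call `evolve(pop, toy_score, lambda s: s == TARGET, toy_breed, toy_mutate)`. Because survivors are carried over unchanged, the best score never goes backwards between generations.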
And so the tool forced evolution does exactly this.
We create a large number of pseudo-random strings.
We are pulling upon the history of all previous, well, all that I could find, SQL injections and command injections.
And using them to influence the population of creatures that we breed.
So we're not losing evolutionary progress.
We're progressing forward.
So we create a large amount of strings.
And then we breed in what we know has worked in the past.
But we use that just to influence the population.
We don't actually say, okay, we have a set of signatures because then we're back to the original problem.
And then we go through the exact same process as a generic genetic algorithm.
We send the string as a parameter value, either POST or GET, what have you.
And then we use the response from the server to determine the score.
And this could be many things.
So we have a good deal of granularity on how we can score a string.
And then, you know, just like the rest, we cull, we breed, we mutate.
And then when we find a string that successfully exploits an app, we display it.
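The seeding step, where past exploits influence the starting population without becoming a fixed signature list, might look something like the sketch below. `KNOWN_PAYLOADS` is a tiny hypothetical stand-in for the database of historical SQL and command injection strings; the 20% seeding ratio and the "random slice plus noise" scheme are likewise assumptions, not the tool's actual behavior.

```python
import random
import string

# Hypothetical stand-in for the database of historical SQL injection and
# command injection strings the talk mentions; not the tool's actual corpus.
KNOWN_PAYLOADS = ["' OR 1=1 --", "\" OR \"\"=\"", "admin' --", "; id", "| whoami"]
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def random_string(rng, min_len=1, max_len=20):
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def seed_population(size, known=KNOWN_PAYLOADS, known_fraction=0.2, rng=random):
    """Mostly pseudo-random strings, with a minority seeded from fragments of
    known payloads so past exploits influence, but don't dictate, the search."""
    population = []
    for _ in range(size):
        if rng.random() < known_fraction:
            payload = rng.choice(known)
            # Take a random, non-empty contiguous slice, then pad with noise.
            start = rng.randrange(len(payload))
            end = rng.randint(start + 1, len(payload))
            population.append(payload[start:end] + random_string(rng, 0, 5))
        else:
            population.append(random_string(rng))
    return population
```

Using fragments rather than whole payloads is one way to keep "what worked before" as raw genetic material instead of a checklist.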
So there's a number of things that we also need to talk about.
Like, what is this fitness function?
Like, how do we define, is this string better than another string?
And there's a couple of things that we can look at and say, does it cause weird behavior?
Is the string reflected?
There might be a potential for XSS in this.
Does the string cause an error?
And if so, is our SQL injection or command injection displayed inside of that error?
That gives us additional information as well.
And also,
does the exploit string cause goal data or sensitive data to be displayed?
So that we can see, oh, potentially this is a good exploit.
So once we've found out what a creature's score is,
then we breed the top scores, and then we kill the underperforming scores.
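Here is one way the fitness signals listed above could be turned into a number. The talk names the signals (weird behavior, reflection, errors, the injection showing up in the error, goal data displayed) but not their weights, so the weights and `ERROR_MARKERS` below are hypothetical; "weird behavior" is approximated as "differs from the response to a benign request".

```python
# Hypothetical error markers and weights: the talk names the signals,
# not their exact values.
ERROR_MARKERS = ("SQL syntax", "mysql_fetch", "ORA-", "sqlite3.", "sh: ")


def fitness(payload, response_text, baseline_text, goal_text):
    """Score one injection string from the server's response."""
    score = 0
    if response_text != baseline_text:
        score += 1    # weird behavior: differs from a benign request's response
    if payload in response_text:
        score += 2    # reflected back (could also hint at XSS)
    if any(marker in response_text for marker in ERROR_MARKERS):
        score += 5    # triggered a database or shell error
        if payload in response_text:
            score += 5    # and our injection shows up alongside that error
    if goal_text in response_text:
        score += 100  # goal or sensitive data displayed: a working exploit
    return score
```

The large goal bonus is a design choice: any string that reaches the goal text should outrank every merely "interesting" string, while the smaller terms give the search a gradient to climb before that happens.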
And the majority of, well, I can't really say majority,
but a good chunk of genetic algorithms
use genome crossover.
And this works really well in our domain
because we have these variable length SQL injection strings
that we need to breed against each other.
And so this breeding process consists of cutting each string in half
and then mixing halves and then mutating them.
And the current implementation that I have in the tool is
two parents create four children and also survive themselves.
So they pass on their genes,
and they also live to see another day until someone is better than them.
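A minimal sketch of that crossover step: each parent is cut in half, halves are mixed, two parents yield four children, and the parents themselves stay in the population. The talk doesn't say which four half-combinations are used, so the four below are an assumption, as are the helper names; mutation would then be applied to each child.

```python
def halves(s):
    """Cut a string in half; for odd lengths the back half gets the extra char."""
    mid = len(s) // 2
    return s[:mid], s[mid:]


def crossover(parent_a, parent_b):
    """Two parents -> four children, each built from one half of each parent.
    Which four half-combinations the real tool uses is an assumption."""
    a_front, a_back = halves(parent_a)
    b_front, b_back = halves(parent_b)
    return [a_front + b_back, b_front + a_back,
            b_back + a_front, a_back + b_front]


def breed_survivors(survivors):
    """Pair survivors off in ranked order; parents stay in the population."""
    next_population = list(survivors)
    for a, b in zip(survivors[0::2], survivors[1::2]):
        next_population.extend(crossover(a, b))
    return next_population
```

Because each parent is cut at its own midpoint, parents of different lengths produce children of varying lengths, which is what lets variable-length injection strings breed with each other.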
Now, for the next step, like, what do we mean by mutating strings?
Or mutating our exploits?
So...
Yeah, that whiskey, oof.
For the mutation rate, I've found
it's usually best to have it variable.
And there's a number of operations that we can use,
but it all boils down to three essential operations.
We have mutation, changing a single byte in a string.
We have adding information,
and we also have removing information as well.
So it's somewhat like natural evolution.
And so, say we take the pre-mutated string ABCD.
The mutations that have been applied to it
are: an X has been prepended to the string,
the B has been deleted,
and the D has been mutated to an F, which gives XACF.
So hopefully that'll give you some idea of what we're saying.
We're not doing anything crazy.
We're just picking a random part of the string,
and changing it a little.
So that's how we mutate the strings.
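The three operations (change a character, add information, remove information) can be written as small pure functions, plus a `mutate` wrapper that applies one of them at random. The helper names, the character alphabet, and the default rate are assumptions for illustration; `rate` is a parameter so the caller can vary it, in line with the advice to keep the mutation rate variable.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def substitute(s, i, ch):
    """Mutation: change the character at index i."""
    return s[:i] + ch + s[i + 1:]


def insert(s, i, ch):
    """Add information: insert ch before index i (i == len(s) appends)."""
    return s[:i] + ch + s[i:]


def delete(s, i):
    """Remove information: drop the character at index i."""
    return s[:i] + s[i + 1:]


def mutate(s, rate=0.1, rng=random):
    """With probability `rate`, apply one randomly chosen operation
    at a random position. `rate` is meant to be varied by the caller."""
    if rng.random() >= rate:
        return s
    op = rng.choice(("substitute", "insert", "delete"))
    if op == "insert" or not s:
        return insert(s, rng.randint(0, len(s)), rng.choice(ALPHABET))
    i = rng.randrange(len(s))
    if op == "delete" and len(s) > 1:
        return delete(s, i)
    return substitute(s, i, rng.choice(ALPHABET))
```

The ABCD example from the talk, done by hand: `substitute(delete(insert("ABCD", 0, "X"), 2), 3, "F")` prepends X, deletes B, and turns D into F, giving `"XACF"`.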
Now, there's a couple things to keep in mind as we go throughout,
because we have this algorithmic process of breeding, killing, breeding, killing.
So our population is going to vary,
and the mutation rate versus search speed is very important,
because if we mutate too quickly,
if we say every single attack string that we have is going to change,
it's essentially throwing random data at the web server,
and it's really not efficient.
It's not worth doing.
It's taking a bunch of dice, throwing it in the air,
and hoping you get all sixes.
So it has to be tuned down to a point where it's an efficient search.
And there's also the string cull rate versus the repopulation speed.
If you cull more than you breed,
the amount of strings in your population will decrease and vice versa.
If you repopulate too quickly,
they'll breed like rabbits and you'll denial-of-service your own machine.
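The cull-versus-repopulation balance can be worked out directly from the breeding scheme described earlier (two parents make four children and also survive). If a fraction k of N strings survives, those kN survivors form kN/2 pairs, so the next generation has about kN + 4(kN/2) = 3kN strings: keeping more than a third grows the population, keeping less shrinks it. The keep fraction is a hypothetical knob used only for this sketch; `Fraction` avoids floating-point rounding in the floor.

```python
from fractions import Fraction


def next_size(n, keep_fraction):
    """Survivors = floor(n * keep_fraction); survivors pair off, each pair
    has four children, and the parents survive too."""
    survivors = int(n * keep_fraction)
    pairs = survivors // 2
    return survivors + 4 * pairs


def simulate(n, keep_fraction, generations):
    """Population size per generation, starting from n."""
    sizes = [n]
    for _ in range(generations):
        sizes.append(next_size(sizes[-1], keep_fraction))
    return sizes
```

For example, keeping half of 1,000 strings gives 1,500, then 2,250, then 3,373 (the odd survivor can't pair); keeping a quarter drops 1,000 to 750; keeping exactly a third of 1,002 holds steady at 1,002.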
So with these things in mind,
I went ahead
and I compiled a couple of statistics on the leading edge tools.
And I did Acunetix, Burp, Zap (the OWASP Zap),
and SQLMap, as well as forced evolution.
This is just the raw data,
but I'll go through some charts to show you how they compare.
The number of requests sent to the server is very significant.
Forced evolution sends on average maybe 10,000 to 30,000 requests
to a server.
So this is not exactly a stealth attack tool.
But we'll get into some of the pros later.
And the time to exploit is usually dependent on network latency.
And so these will fluctuate a little bit.
But forced evolution does perform well compared to some tools,
but not very well at all to others.
I also compiled the same statistics for SQL injection.
And the total number of requests to the server
decreases dramatically, because SQL injection has a finer way
of expressing the score associated with the fitness function.
There's a better way and it's easier to score one string higher than another
because you have more information to do so.
And so it's naturally more efficient because it depends
on that fitness function, that scoring mechanism,
to determine who lives or what string lives and what string dies.
And so it reaches the solution faster.
And the time to exploit, as well, decreases proportionally.
So with that, let's go ahead and try a demo.
May the demo gods be gracious.
Because this does depend on Python import random.
So let's hope everything works.
There we go.
Okay.
Ah, this is terrible.
I'm sorry.
Okay.
So we have a generic web application here with a login form.
And it is vulnerable to SQL injection, as you can see.
I'll type in just some random characters.
And it doesn't bring back correct input.
And there's also other problems with it, as well.
So we know that a vulnerability there exists.
And we can discover this vulnerability or this suspicious area,
like we talked about previously, through other scanning tools.
And now all we have to do is point forced evolution at it.
And it will go ahead and exploit it for us.
Let me see.
My VMs all of a sudden changed size.
Sorry.
There we go.
Okay.
So.
And forced evolution will be up on GitHub in about 15 minutes after the talk.
So for the command-line options, we have a target.
And for this we'll just do localhost.
And we have an address of the vulnerable web page.
So in that case that will be sqli index.php.
And then we also have the vulnerable variable, which I believe is password.
Although I believe both would work.
And then the method.
The method previously was displayed as POST.
Or, I'm sorry, GET.
But the tool has both options.
And then the other variables we'll just include for completeness.
We'll just include the user name.
Typo?
I would be dangerous if I had my glasses.
Okay.
User name equals, let's just say user name.
Defcon.
And then we also have what will constitute a valid exploit.
So in this case we want to get to the administrative area of the site.
So our goal text will be administrative.
Actually, we'll just put admin,
because the tool will parse every response it receives back
and check whether it contains that string.
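The options above (target, page, vulnerable variable, method, other variables, goal text) map naturally onto an HTTP request plus a substring check. The sketch below uses only the standard library; `build_request`, `reached_goal`, and the page name `login.php` are hypothetical and are not forced evolution's actual interface or flags.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(target, page, vulnerable_var, payload,
                  method="GET", extra=None, scheme="http"):
    """Put the payload in the vulnerable parameter, alongside any other
    parameters, as either a GET query string or a POST form body."""
    params = dict(extra or {})
    params[vulnerable_var] = payload
    query = urlencode(params)
    url = f"{scheme}://{target}/{page.lstrip('/')}"
    if method.upper() == "GET":
        return Request(f"{url}?{query}", method="GET")
    return Request(url, data=query.encode(), method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def reached_goal(response_text, goal_text):
    """A string counts as a working exploit if the goal text shows up."""
    return goal_text in response_text
```

Sending is then `urllib.request.urlopen(req).read().decode()`, with the decoded body passed to `reached_goal` and the fitness function. A plain substring match is also why a short goal like "admin" is a reasonable choice: it matches any page that mentions the admin area.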
So on the right-hand side I have a tail of the current requests coming into the web server.
So as we start running the tool, that will jump up.
Wish me luck.
Here we go.
All right.
Right now.
It has created a large number of strings.
Well, actually not that large.
It's only about a thousand.
But it's running them against the web server currently.
And it's scoring them based upon what the response it receives back.
And it's taking the top performers and then it's breeding them.
So right now we're at generation two.
Three.
Four.
Five.
Four.
Four.
Four.
Four.
And because this is based upon random strings, sometimes the solution is found extremely quickly.
And sometimes it takes a while.
It's still running. But as the database of previous exploits grows,
this will become much, much faster.
Come on, come on.
There we go, there we go.
Alright.
Ah, let me drag this back over to my side.
So the pros and cons of using genetic algorithms.
The cons, there's a couple major ones.
This is not a very stealthy attack tool.
As you can see, this generates a large amount of requests to the web server.
And that's inherent in genetic algorithms as a whole.
And there's a small potential to inadvertently destroy the database and operating system.
So I wouldn't run this against, I wouldn't run this against a production environment.
Job security? I don't know.
Yeah, and it is a slower process to develop and test exploits,
at least from the front end.
I'm sure anyone in the audience who saw that SQL injection could have written it out by hand,
and you saw the program took, you know, 20, 30 seconds to do it.
And genetic algorithms will always be suboptimal compared to source code analysis,
because source code review gives you more code coverage.
But the pros, the pros for genetic algorithms and using these to create exploits are fantastic.
They're really cheap in CPU, RAM, hard drive, and human time.
You can run that on a Raspberry Pi.
Your only limiting feature or factor is the network speed.
Like how far away are you from the web server.
And as far as my time goes, I can just turn it on and it runs.
I don't look at it again.
It's good.
And I feel it has more complete code coverage than other black box approaches,
because not only does it have the signatures that the other black box approaches have,
it also isn't bound by a box of thinking,
someone saying, this is what we know a good SQL injection to be.
It doesn't have that definition.
It's limitless in its approach to the solution.
And so that takes us to the, yeah.
Right now, this tool will break web applications.
It might not do it efficiently.
But as the database of SQL exploits grows, it will do it more efficiently.
And another huge pro for this is automatic exploit development.
I don't have to invest my time into sitting down and figuring, oh, okay,
I got SQLi.
Oh, okay.
There's a WAF.
Oh, okay.
There's something else.
There's filtering rules.
This doesn't need to know about those.
It just cares about that question and response.
And so it's really fantastic in that regard.
And the last biggest pro for this is emergent exploit discovery.
Because since this isn't bound by what we know as, okay, this is a valid exploit.
This will create new things.
New ways of approaching problems that we haven't seen yet.
And for that reason, I think it's absolutely fantastic.
And I think we should pursue this.
So in conclusion, you can download the tool.
Give me about 15 minutes.
And there's my contact info.
So.